Abstract: We describe a family of highly efficient codes for cryptographic purposes and dedicated algorithms for their manipulation. Our proposal is especially tailored for highly constrained platforms, and surpasses certain conventional and post-quantum proposals (like RSA and NTRU, respectively) according to most if not all efficiency metrics.



Index Terms: Algorithms, Cryptography, Decoding, Error correction




I. Introduction 

Quantum computers, should they become a technological reality, will pose a threat to public-key cryptosystems based on certain intractability assumptions, like the integer factorization problem, or IFP (like RSA), and the discrete logarithm problem, or DLP (like Diffie-Hellman or DSA, in their elliptic curve version or otherwise). To face this scenario, several cryptosystems have been proposed that apparently resist attacks mounted with the help of quantum computers. The security of these so-called post-quantum cryptosystems [8] stems from quite distinct computational intractability assumptions. Such schemes are not necessarily new: for instance, cryptosystems based on coding theory (specifically, on the intractability of the syndrome decoding problem, or SDP) have been known for nearly as long as the very concept of asymmetric cryptography itself, though they have only recently been attracting renewed interest.

However, being quantum-resistant is not the only interesting feature of many post-quantum proposals: some of them are equally remarkable for their improved efficiency and simplicity, relative to conventional schemes, in certain types of applications. Thus, schemes based on the SDP entirely avoid the multiprecision integer arithmetic typically needed by IFP or DLP cryptosystems, and their computational cost is usually of a lower asymptotic order than that of those systems, reaching $O(n)$ instead of the $O(n^2)$ or $O(n^3)$ that are commonplace in pre-quantum schemes. This indicates that post-quantum alternatives may have advantages even in situations where quantum attacks are not the main concern, and justifies investigating how advantageous they can be.

Particularly interesting scenarios where such post-quantum schemes may have a positive impact are wireless sensor networks [2], [35] and the so-called "Internet of Things," in which a wide range of devices are interconnected, from the most powerful clustered servers to embedded systems with extremely limited processing resources, storage, bandwidth, and power budget, including microcontrollers [25], [43] and dedicated hardware [27].

One of the leading families of post-quantum cryptographic schemes is that of code-based cryptosystems [28], [34]. In contrast with the form in which these systems were originally proposed, where key sizes were typically large, modern approaches offer far more space-efficient parameters and are fairly practical on general-purpose platforms. One can thus ask whether such schemes are suitable for highly constrained platforms as well.

Low-density parity-check (LDPC) codes and their quasi-cyclic variants (QC-LDPC) have been proposed for cryptographic applications […], although in a form still unsuitable for constrained platforms. Recently, quasi-cyclic moderate-density parity-check (QC-MDPC) codes have been designed to provide strong security assurances for McEliece-style cryptosystems [30]. Such codes are arguably ideal for modern general-purpose platforms, matching or surpassing the processing efficiency of conventional cryptosystems. However, no assessment of their suitability for constrained platforms has been made, and indeed the traditional bit-flipping and belief-propagation decoding methods, even though they are quite processing-efficient, appear at first glance unsuitable for an Internet-of-Things scenario due to their considerable storage requirements.

A. Our Results 

Our contributions in this paper are twofold:
• On the one hand, a family of linear error-correcting codes (so-called CS-MDPC codes) that are highly efficient for cryptographic applications in terms of reduced per-key and per-message bandwidth occupation;
• On the other hand, an efficient decoder for that family of codes that is especially tailored for (though not restricted to) constrained platforms.
Specifically, we show how to obtain code-based cryptosystems where the public keys and the space overhead incurred for each cryptogram are comparable in size to, or even smaller than, the corresponding values for the RSA cryptosystem at practical security levels. A careful selection of design features for the key generation, encoding, and decoding algorithms leads to very short processing times, and the executable code size in software or area occupation in hardware (and thus potentially also energy consumption) tends to be considerably smaller than what can be attained with RSA or elliptic curve cryptosystems. Our proposed variant of the bit-flipping decoding technique needs only $O(1)$ ancillary storage, in comparison with $O(n)$ (where n is the code length) in previous variants of that technique.

B. Organization of the Paper 

The remainder of this document is organized as follows. We provide theoretical preliminaries in Section II, including LDPC and MDPC codes, the hard decision decoding method, and code-based cryptosystems. We describe the new family of codes and assess its security properties in Section III. In Section IV we outline our proposed techniques to deploy code-based cryptosystems on embedded platforms, in particular an efficient bit-flipping decoder that takes only $O(1)$ ancillary storage instead of the usual $O(n)$. We illustrate some suggested parameters for typical security levels in Section V and assess the overall results of our proposal experimentally in Section VI. We conclude in Section VII.



II. Preliminaries 



A. General notation 



Matrix and vector indices will be numbered from 0 throughout this paper, unless otherwise stated. Let p be a prime and let $q = p^m$ for some $m > 0$. The finite field of q elements is $\mathbb{F}_q$, and we denote by $\mathrm{cir}(h)$ the circulant matrix whose first row is h.

B. Error Correcting Codes

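A (binary) linear [n, k] error-correcting code $\mathcal{C}$ is a subspace of $\mathbb{F}_2^n$ of dimension k. Such a code is specified by either a generator matrix $G \in \mathbb{F}_2^{k \times n}$ such that $\mathcal{C} = \{uG \in \mathbb{F}_2^n \mid u \in \mathbb{F}_2^k\}$, or else by a parity-check matrix $H \in \mathbb{F}_2^{r \times n}$ such that $\mathcal{C} = \{v \in \mathbb{F}_2^n \mid vH^T = 0^r\}$, where $r = n - k$.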
We will be particularly interested in quasi-cyclic codes, namely, codes that admit a parity-check matrix consisting of $n_0$ horizontally joined circulant square blocks of size $r \times r$. Thus:

$$H = [\mathrm{cir}(h_0) \mid \mathrm{cir}(h_1) \mid \cdots \mid \mathrm{cir}(h_{n_0-1})],$$

where $h_i \in \mathbb{F}_2^r$, $0 \le i < n_0$. The representation advantages of such codes are obvious, since H can be compactly stored as $n_0$ sequences of r bits each.
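To make the compact representation concrete, the following Python sketch (our own illustration; `cir`, `qc_parity_check` and `syndrome` are hypothetical helper names) expands the $n_0$ stored first rows into the full matrix H and computes a syndrome $s = eH^T$ over $\mathbb{F}_2$. It assumes the convention that row i of $\mathrm{cir}(h)$ is h rotated right by i positions, i.e. entry $(i, j)$ is $h_{(j-i) \bmod r}$.

```python
def cir(h: list[int]) -> list[list[int]]:
    """Circulant r x r matrix with first row h: entry (i, j) is h[(j - i) mod r]."""
    r = len(h)
    return [[h[(j - i) % r] for j in range(r)] for i in range(r)]


def qc_parity_check(first_rows: list[list[int]]) -> list[list[int]]:
    """H = [cir(h_0) | cir(h_1) | ... | cir(h_{n0-1})], built from the n0 stored first rows."""
    blocks = [cir(h) for h in first_rows]
    r = len(first_rows[0])
    # Row i of H is the concatenation of row i of every circulant block.
    return [[bit for block in blocks for bit in block[i]] for i in range(r)]


def syndrome(H: list[list[int]], e: list[int]) -> list[int]:
    """s = e H^T over GF(2): component i is the parity of row i of H on the support of e."""
    return [sum(h & x for h, x in zip(row, e)) % 2 for row in H]


# Hypothetical toy instance: n0 = 2 blocks of size r = 5, so only 10 bits are stored.
H = qc_parity_check([[1, 1, 0, 1, 0], [0, 1, 0, 0, 1]])
e = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(syndrome(H, e))  # [1, 0, 1, 0, 1], i.e. column 0 of H
```

Only the $n_0 r = 10$ bits of the two first rows need to be stored; the full $5 \times 10$ matrix is expanded here just to expose its structure.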

The syndrome decoding problem (SDP) consists of computing an error pattern $e \in \mathbb{F}_2^n$ of low (prescribed) weight, given a parity-check matrix $H \in \mathbb{F}_2^{r \times n}$ for the underlying code and a syndrome $s = eH^T \in \mathbb{F}_2^r$. In general the SDP is NP-hard, but sometimes the knowledge of certain structural code properties makes this problem solvable in polynomial time.



C. LDPC codes 

LDPC codes were invented by Robert Gallager [20] and are linear codes obtained from sparse bipartite graphs. Suppose that $\mathcal{G}$ is a graph with n left nodes (called message nodes) and r right nodes (called check nodes). The graph gives rise to a linear code of block length n and dimension at least $n - r$ in the following way: the n coordinates of the codewords are associated with the n message nodes. The codewords are those vectors $(c_1, \ldots, c_n)$ such that for all check nodes the sum of the neighboring positions among the message nodes is zero.

The graph representation is analogous to a matrix representation by looking at the adjacency matrix of the graph: let H be a binary $r \times n$ matrix in which the entry $(i, j)$ is 1 if and only if the i-th check node is connected to the j-th message node in the graph. Then the LDPC code defined by the graph is the set of vectors $c = (c_1, \ldots, c_n)$ such that $Hc^T = 0$.
The matrix H is called a parity check matrix for the code. 
Conversely, any binary r x n matrix gives rise to a bipartite 
graph between n message and r check nodes, and the code 
defined as the null space of H is precisely the code associated 
to this graph. Therefore, any linear code has a representation 
as a code associated to a bipartite graph (note that this graph is 
not uniquely defined by the code). However, not every binary 
linear code has a representation by a sparse bipartite graph. 
If it does, then the code is called a low-density parity-check 
(LDPC) code. 
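A minimal Python sketch of the correspondence between H and its bipartite graph, assuming H is given as a list of 0/1 rows (the helper names are hypothetical). Each check node keeps the list of message nodes it touches, and membership in the code is the condition that every check node sees an even number of ones. The toy matrix is the parity-check matrix of the [7, 4] Hamming code, which is not sparse and is used only because it is small.

```python
def check_neighbors(H: list[list[int]]) -> list[list[int]]:
    """Graph as adjacency lists: check node i is joined to message node j iff H[i][j] == 1."""
    return [[j for j, bit in enumerate(row) if bit] for row in H]


def is_codeword(neighbors: list[list[int]], c: list[int]) -> bool:
    """c is in the code iff, at every check node, the neighboring positions of c sum to 0 mod 2."""
    return all(sum(c[j] for j in nbrs) % 2 == 0 for nbrs in neighbors)


# Toy parity-check matrix of the [7, 4] Hamming code (dense; chosen only for its size).
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
graph = check_neighbors(H)
print(graph)                                       # [[0, 2, 4, 6], [1, 2, 5, 6], [3, 4, 5, 6]]
print(is_codeword(graph, [1, 1, 1, 0, 0, 0, 0]))   # True
print(is_codeword(graph, [1, 0, 0, 0, 0, 0, 0]))   # False
```

For a genuinely sparse H, the adjacency lists take space proportional to the number of edges (the ones in H) rather than to $rn$.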

An important subclass of LDPC codes that feature encoding advantages over other codes of the same class is that of quasi-cyclic low-density parity-check (QC-LDPC) codes […]. In general, an [n, k]-QC-LDPC code satisfies $n = n_0 b$ and $k = k_0 b$ (thus also $r = r_0 b$) for some b, $n_0$, $k_0$ (and $r_0$), and admits a parity-check matrix consisting of $r_0 \times n_0$ blocks of $b \times b$ sparse circulant submatrices. Of particular importance is the case where $b = r$ (and hence $r_0 = 1$ and $k_0 = n_0 - 1$), since a systematic parity-check matrix for this code is entirely defined by the first row of each $r \times r$ block. We say that the parity-check matrix is in circulant form.

D. QC-MDPC codes 

A cryptographically interesting subclass of the QC-LDPC family is that of quasi-cyclic moderate-density parity-check (QC-MDPC) codes [30].

QC-MDPC codes in this sense are an entirely distinct class from a family of algebraic codes also known as "MDPC" and designed by Ouzan and Be'ery [36], despite the name clash. The goal set forth in the latter approach is to obtain high-rate codes of short to moderate length whose duals contain known intermediate-weight words (and thus admit parity-check matrices of moderate density, hence the "MDPC" name), but which still have a good error correction capability in comparison with algebraic codes like BCH with similar length and rate. Examples from [36] indicate that typical densities are in the range 17% to 28% of the code length. Thus they are indeed intermediate between usual LDPC codes and general ones, but the density is too high for conventional LDPC decoding techniques: those codes are not classical Gallager codes, in the sense that the density of their dual codes puts them beyond the capability of decoding techniques like plain belief propagation and bit flipping, and especially tailored decoders must thus be adopted.

By contrast, QC-MDPC codes in the sense of Misoczki et al. are oriented toward cryptographic purposes, with densities close enough to LDPC codes as to enable decoding by Gallager's simpler (and arguably more efficient) belief propagation and bit-flipping methods, yet dense enough to prevent attacks based on the presence of too sparse words in the dual code, like the Stern attack [39] and variants, without losing so much of the error correcting capability that information-set decoding attacks [5], [10] become feasible. Furthermore, to prevent structural attacks as proposed by Faugère et al. [18] and by Leander and Gauthier [42], cryptographically-oriented codes must remain as unstructured as possible except for the hidden trapdoor that enables private decoding and, in the case of quasi-cyclic codes, external symmetries that allow for efficient implementation. Finally, the very circulant symmetry might introduce weaknesses as pointed out by Sendrier [38], but these induce only a polynomial (specifically, linear) gain in attack efficiency, and a small adjustment in parameters copes with it entirely. Typical densities in this case are in the range 0.4% to 0.9% of the code length, one order of magnitude above LDPC codes but well below the published MDPC range above, and certainly within the realm of Gallager codes. Construction is also as random as possible, merely keeping the desired density and circulant geometry, and code lengths are much larger than typical MDPC values.

E. Gallager's Hard Decision (Bit Flipping) Decoding Method

We briefly recapitulate Gallager's hard decision decoding algorithm, closely following the very concise and clear description by Huffman and Pless [22]. This will provide the basis for the efficient variant we propose for embedded platforms.

Assume that a message is encoded with a binary LDPC code for transmission, and the vector c is received. In the computation of the syndrome $s = cH^T$, each received bit of c affects at most $d_v$ components of that syndrome. If only the j-th bit of c contains an error, then the corresponding $d_v$ components $s_i$ of s will equal 1, indicating the parity check equations that are not satisfied. Even if there are some other bits in error among those that contribute to the computation of $s_i$, one expects that several of the $d_v$ components of s will equal 1. This is the basis of Gallager's hard decision decoding, or bit-flipping, algorithm.

1) Compute $cH^T$ and determine the unsatisfied parity checks (namely, the parity checks where the components of $cH^T$ equal 1).

2) For each of the n bits, compute the number of unsatisfied parity checks involving that bit.

3) Flip the bits of c that are involved in the largest number of unsatisfied parity checks.

4) Repeat steps 1, 2, and 3 until either $cH^T = 0$, in which case c has been successfully decoded, or until a certain bound in the number of iterations is reached, in which case decoding of the received vector has failed.
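The four steps above can be transcribed into Python as follows. This is an illustrative sketch under our own naming, not an optimized decoder; it deliberately keeps the per-bit counter array of step 2 so that its cost is visible. The toy [7, 4] Hamming matrix is an assumption chosen for its size, not a realistic LDPC code.

```python
from typing import Optional


def bit_flip_decode(H: list[list[int]], c: list[int], max_iters: int = 20) -> Optional[list[int]]:
    """Gallager's hard-decision decoding (steps 1-4); returns the corrected word or None."""
    r, n = len(H), len(H[0])

    def syn(word: list[int]) -> list[int]:
        # Step 1: s = word H^T; s[i] == 1 marks an unsatisfied parity check.
        return [sum(h & x for h, x in zip(row, word)) % 2 for row in H]

    c = list(c)
    for _ in range(max_iters):
        s = syn(c)
        if not any(s):
            return c
        # Step 2: one counter per bit (the O(n) ancillary array).
        counts = [sum(s[i] for i in range(r) if H[i][j]) for j in range(n)]
        # Step 3: flip every bit involved in the largest number of unsatisfied checks.
        top = max(counts)
        c = [x ^ 1 if cnt == top else x for x, cnt in zip(c, counts)]
    # Step 4: iteration bound reached; succeed only if the last flip fixed everything.
    return c if not any(syn(c)) else None


H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
received = [1, 1, 1, 0, 0, 0, 1]       # codeword 1110000 with bit 6 flipped
print(bit_flip_decode(H, received))   # [1, 1, 1, 0, 0, 0, 0]
```

On such a tiny, dense code bit flipping does not correct every single error: an error in bit 0, for instance, makes the decoder oscillate between two words until the iteration bound is reached, and it returns `None`. The example is meant only to show the control flow of steps 1 to 4.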



The bit-flipping algorithm is not the most powerful decoding method for LDPC codes; indeed, the belief propagation technique […], [22] is known to exceed its error correction capability. However, belief-propagation decoders involve computing ever more refined probabilities that each bit of the received word c contains an error, thus incurring floating-point arithmetic or suitable high-precision approximations and computationally expensive algorithms. In a scenario where the number of errors is fixed and known in advance, as is the case in cryptographic applications, parameters can be designed so that the more powerful but also more complex and expensive belief propagation methods are not necessary for decoding.

We therefore focus on the problem of designing an optimized variant of bit-flipping decoding for highly constrained platforms. Specifically, such methods still suffer from the drawback of requiring a large amount of ancillary memory for counters: if each column of H has Hamming weight $d_v$, step 2 requires $\lfloor \lg d_v \rfloor + 1$ bits to store the number of unsatisfied parity checks for each of the n bits of c, hence $n(\lfloor \lg d_v \rfloor + 1)$ bits overall. Besides, steps 2 and 3 involve a loop of length n each, introducing processing inefficiency. We will show how to avoid these drawbacks in Section IV-C.
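The counter memory quoted above is easy to evaluate. The short Python helper below (a hypothetical name, with illustrative parameters) computes $n(\lfloor \lg d_v \rfloor + 1)$, relying on the fact that `int.bit_length()` returns exactly $\lfloor \lg d_v \rfloor + 1$ for $d_v \ge 1$.

```python
def counter_storage_bits(n: int, d_v: int) -> int:
    """Ancillary storage of step 2: n counters, each holding a value in 0..d_v.

    A value up to d_v needs floor(lg d_v) + 1 bits, which is d_v.bit_length() for d_v >= 1.
    """
    return n * d_v.bit_length()


# Hypothetical parameters: n0 = 2 circulant blocks of size 4800 (n = 9600), column weight d_v = 45.
print(counter_storage_bits(9600, 45))   # 57600 bits, i.e. 7200 bytes
```

By comparison, a decoder that never materializes the counter array needs a single $(\lfloor \lg d_v \rfloor + 1)$-bit register, six bits in this example.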

F. McEliece and Niederreiter encryption 

The McEliece encryption scheme was proposed by R. McEliece [28] in 1978. In that scheme, the public key is a generator matrix for a certain code whose decoder is taken to be the private key. An equivalent scheme using a parity-check matrix as public key was proposed by H. Niederreiter in 1986 [34]. We briefly review these schemes, which consist of three algorithms (KeyGen, Encrypt, Decrypt) each.

1) McEliece: 

• KeyGen: Select a binary t-error correcting [n, k]-code $\mathcal{C}$ with a decoding trapdoor $\mathcal{T}$ and a $k \times n$ generator matrix G in systematic form. The public key is (G, t), and the private key is the decoding trapdoor $\mathcal{T}$.

• Encrypt: To encrypt a plaintext $m \in \mathbb{F}_2^k$ into a cryptogram $c \in \mathbb{F}_2^n$, select a uniformly random error pattern $e \in \mathbb{F}_2^n$ of weight t, and compute $c \leftarrow mG + e$.

• Decrypt: To decrypt $c \in \mathbb{F}_2^n$, apply the decoding trapdoor $\mathcal{T}$ to correct the t errors in c (thus finding the error pattern $e \in \mathbb{F}_2^n$ of weight t), then extract $m \in \mathbb{F}_2^k$ from the first k columns of $c - e$.

2) Niederreiter: 

• KeyGen: Select a binary t-error correcting [n, k]-code $\mathcal{C}$ with a decoding trapdoor $\mathcal{T}$ and an $r \times n$ parity-check matrix H in systematic form, where $r = n - k$. The public key is (H, t), and the private key is the decoding trapdoor $\mathcal{T}$.

• Encrypt: To encrypt a plaintext $m \in \mathbb{Z}/\binom{n}{t}\mathbb{Z}$ into a cryptogram $c \in \mathbb{F}_2^r$, encode m into a vector $e \in \mathbb{F}_2^n$ of weight t via some conventional permutation unranking method, and compute $c \leftarrow eH^T$.

• Decrypt: To decrypt $c \in \mathbb{F}_2^r$, apply the decoding trapdoor $\mathcal{T}$ to the syndrome c (thus finding the corresponding vector $e \in \mathbb{F}_2^n$ of weight t), then decode $m \in \mathbb{Z}/\binom{n}{t}\mathbb{Z}$ from e via permutation ranking.
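The Encrypt and Decrypt steps leave the (un)ranking method open. One conventional choice, assumed here rather than taken from this paper, is the combinatorial number system, which maps each integer $m < \binom{n}{t}$ to a distinct set of t positions and back. A minimal Python sketch (function names hypothetical):

```python
from math import comb


def rank(support: list[int]) -> int:
    """Weight-t vector (given by its support) -> integer in [0, C(n, t)) via the combinatorial number system."""
    return sum(comb(pos, i + 1) for i, pos in enumerate(sorted(support)))


def unrank(m: int, n: int, t: int) -> list[int]:
    """Inverse of rank: sorted support of the weight-t vector of length n encoding m < C(n, t)."""
    support = []
    for k in range(t, 0, -1):
        pos = k - 1
        # Largest pos below the current bound n with C(pos, k) <= m.
        while pos + 1 < n and comb(pos + 1, k) <= m:
            pos += 1
        support.append(pos)
        m -= comb(pos, k)
        n = pos                      # remaining positions must be strictly smaller
    return sorted(support)


# Encrypt side: plaintext m -> error vector e of length n = 6 and weight t = 3.
n, t = 6, 3
support = unrank(19, n, t)
e = [int(j in support) for j in range(n)]
print(support, e)      # [3, 4, 5] [0, 0, 0, 1, 1, 1]
print(rank(support))   # 19 (Decrypt side)
```

The linear search in `unrank` is chosen for clarity, not speed, and is not tuned for constrained platforms.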
Although the security of these two schemes is equivalent, Niederreiter is the more efficient […], and is therefore the method of choice for constrained platforms.

III. An efficient family of MDPC codes 

The QC-MDPC codes [30] are arguably among the most efficient settings for code-based cryptosystems. However, QC-MDPC parameters for practical security levels, specifically those corresponding to a cost between a legacy-level $2^{80}$ and a top-level $2^{256}$ steps to mount the best possible attacks, yield key and ciphertext space overheads well above the corresponding values achievable with the RSA cryptosystem, which is perhaps the most widely deployed asymmetric cryptography scheme today and for that reason constitutes a practical upper bound for the corresponding parameters in other cryptographic schemes. Therefore one cannot claim that those codes are generically suitable for constrained platforms.

It turns out we can do better than that with a proper subset 
of QC-MDPC codes. To define it, we now introduce a class 
of matrices that admit a space-efficient representation: 

Definition 1. Given a ring $\mathcal{R}$ and an integer p, the set of cyclosymmetric matrices of order p over $\mathcal{R}$ is the set $\Lambda_p(\mathcal{R})$ of square $p \times p$ matrices over $\mathcal{R}$ that are both circulant and symmetric.

Cyclosymmetric matrices constitute a subring of the ring $\mathcal{R}^{p \times p}$ of $p \times p$ matrices over $\mathcal{R}$ whenever $\mathcal{R}$ is commutative (as finite fields are). This follows from the facts that the identity matrix is cyclosymmetric, that the product of symmetric matrices is itself symmetric if and only if the factors commute, and that circulant matrices over a commutative ring do commute: $(AB)^T = B^T A^T = BA = AB$ (all other properties are trivial). We call this the cyclosymmetric ring of order p over $\mathcal{R}$.
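The closure argument can be checked numerically on small examples. The Python sketch below (our own, with hypothetical helper names) builds two binary cyclosymmetric matrices of order 5, multiplies them over $\mathbb{F}_2$, and verifies that the product is again circulant and symmetric and that the factors commute.

```python
def cir(c: list[int]) -> list[list[int]]:
    """Circulant matrix with first row c: entry (i, j) is c[(j - i) mod p]."""
    p = len(c)
    return [[c[(j - i) % p] for j in range(p)] for i in range(p)]


def matmul_gf2(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Product of two square matrices over GF(2)."""
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) % 2 for j in range(n)] for i in range(n)]


def is_cyclosymmetric(M: list[list[int]]) -> bool:
    """True iff M is both circulant and symmetric."""
    p = len(M)
    circulant = all(M[i][j] == M[0][(j - i) % p] for i in range(p) for j in range(p))
    symmetric = all(M[i][j] == M[j][i] for i in range(p) for j in range(p))
    return circulant and symmetric


A = cir([1, 1, 0, 0, 1])   # c_1..c_4 = 1, 0, 0, 1 is a palindrome
B = cir([0, 0, 1, 1, 0])   # c_1..c_4 = 0, 1, 1, 0 is a palindrome
AB, BA = matmul_gf2(A, B), matmul_gf2(B, A)
print(is_cyclosymmetric(AB), AB == BA)   # True True
```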

A cyclosymmetric ring can itself be defined over another cyclosymmetric ring and so on recursively, yielding a multilayered cyclosymmetric ring ultimately defined over a non-cyclosymmetric ring. This ring tower is written as $\Lambda_{p_1}(\cdots\Lambda_{p_L}(\mathcal{R}_0)\cdots)$ for successively embedded rings of orders $p_1, \ldots, p_L$. Let $\mathrm{lyr}(\mathcal{R})$ denote the number of layers of a multilayered cyclosymmetric ring $\mathcal{R}$. We define the number of layers of a non-cyclosymmetric ring $\mathcal{R}_0$ (e.g. a finite field) to be $\mathrm{lyr}(\mathcal{R}_0) = 0$, and then recursively $\mathrm{lyr}(\Lambda_p(\mathcal{R}')) = \mathrm{lyr}(\mathcal{R}') + 1$. Thus, $\mathrm{lyr}(\Lambda_{p_1}(\cdots\Lambda_{p_L}(\mathcal{R}_0)\cdots)) = L$.

The interest in a cyclosymmetric ring resides in the fact that elements of $\Lambda_p(\mathcal{R})$ can be represented as a sequence of $\lfloor p/2 \rfloor + 1$ elements from $\mathcal{R}$, asymptotically occupying only half the space required by a merely circulant matrix of order p over $\mathcal{R}$. To see this, just note that a circulant $p \times p$ matrix has the form $C_{ij} = c_{(j-i) \bmod p}$, where c is the first row of that matrix, while a symmetric matrix satisfies $C_{ij} = C_{ji}$; combining both conditions yields $c_{(j-i) \bmod p} = c_{(i-j) \bmod p}$, which for $i = 0$ (since the first row alone defines the entire matrix) simplifies to $c_j = c_{-j \bmod p}$, or $c_j = c_{p-j}$ for $j > 0$. Therefore, the sequence $c_1, \ldots, c_{p-1}$ is a palindrome (and $c_0$ is an arbitrary element of $\mathcal{R}$), and only $c_0, \ldots, c_{\lfloor p/2 \rfloor}$ are independent.
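The palindrome argument translates directly into a packing routine. In this Python sketch (hypothetical names, binary entries assumed for simplicity), `compress` keeps only $c_0, \ldots, c_{\lfloor p/2 \rfloor}$ and `expand` rebuilds the first row through $c_j = c_{p-j}$:

```python
def compress(first_row: list[int]) -> list[int]:
    """Independent entries c_0, ..., c_{floor(p/2)} of a cyclosymmetric first row."""
    p = len(first_row)
    if any(first_row[j] != first_row[p - j] for j in range(1, p)):
        raise ValueError("c_1, ..., c_{p-1} is not a palindrome")
    return first_row[: p // 2 + 1]


def expand(half: list[int], p: int) -> list[int]:
    """Rebuild the full first row of order p using c_j = c_{p-j} for j > 0."""
    return [half[min(j, p - j)] for j in range(p)]


row = [1, 1, 0, 0, 1]           # p = 5: c_1..c_4 = 1, 0, 0, 1
half = compress(row)
print(half, expand(half, len(row)) == row)   # [1, 1, 0] True
```

For odd p the stored prefix has $(p+1)/2$ entries; for even p the middle entry $c_{p/2}$ is its own mirror image and is kept as well.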

The space efficiency becomes more noticeable for rings with several layers: an element of $\Lambda_{p_1}(\cdots\Lambda_{p_L}(\mathcal{R}_0)\cdots)$ is represented as $\prod_{i=1}^{L}(\lfloor p_i/2 \rfloor + 1)$ elements of $\mathcal{R}_0$, roughly a fraction $1/2^L$ of the size of a generic circulant matrix of order $\prod_{i=1}^{L} p_i$ over $\mathcal{R}_0$.
Extending the analogy, we define the family of cyclosymmetric codes:

Definition 2. A cyclosymmetric (CS) code over $\mathbb{F}_q$ is a code which admits a block parity-check matrix whose blocks correspond to elements of a (multilayered) cyclosymmetric ring.

In other words, a cyclosymmetric [n, n − r]-code admits an $r \times n$ parity-check matrix H with $r = r_0 p$, $n = n_0 p$, consisting of $r_0 \times n_0$ cyclosymmetric blocks of size $p \times p$ over some smaller ring. The natural advantage of these codes is the compact representation of such parity-check matrices. A particularly efficient case occurs when $r_0 = 1$, that is, H is a simple sequence of $n_0$ cyclosymmetric blocks of size $r \times r$. If H is in systematic form, $r = p_1 \cdots p_L$, and $\mathcal{R} = \Lambda_{p_1}(\cdots\Lambda_{p_L}(\mathbb{F}_q)\cdots)$, then H occupies only $(n_0 - 1)\prod_{i=1}^{L}(\lfloor p_i/2 \rfloor + 1)\lg q$ bits.
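Plugging numbers into the storage formula shows the expected $1/2^L$ behavior. The Python sketch below (hypothetical names; the orders 4801, 67 and 71 are illustrative choices, not parameters proposed in this paper) compares a systematic parity-check matrix with $n_0 = 2$ stored as plain circulant blocks against its cyclosymmetric representation with one and two layers.

```python
from math import log2, prod


def cs_block_bits(ps: list[int], q: int = 2) -> float:
    """Bits for one element of Lambda_{p_1}(...Lambda_{p_L}(F_q)...): prod(floor(p_i/2) + 1) * lg q."""
    return prod(p // 2 + 1 for p in ps) * log2(q)


def cs_parity_check_bits(n0: int, ps: list[int], q: int = 2) -> float:
    """Systematic H with r0 = 1: only the n0 - 1 non-identity blocks are stored."""
    return (n0 - 1) * cs_block_bits(ps, q)


def qc_parity_check_bits(n0: int, ps: list[int], q: int = 2) -> float:
    """The same H stored as plain circulant blocks of order r = prod(p_i)."""
    return (n0 - 1) * prod(ps) * log2(q)


# Hypothetical orders: one layer with p = 4801, or two layers with p = 67 and 71 (r = 4757).
for ps in ([4801], [67, 71]):
    print(ps, qc_parity_check_bits(2, ps), cs_parity_check_bits(2, ps))
```

With one layer the stored block takes 2401 instead of 4801 bits (about 1/2); with two layers, 1224 instead of 4757 bits (about 1/4).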

In cryptographic applications, the natural choice is to adopt binary codes, i.e. $q = 2$, and in particular MDPC codes, due to the simplicity of the decoding algorithm and the greatly reduced parameters that these codes allow for every desired security level. We call a cyclosymmetric MDPC code a CS-MDPC code. Moreover, in the same cryptographic context we propose not only to use CS-MDPC codes, but also to restrict error patterns to the same form as the concatenation of first rows of cyclosymmetric matrices, so that these patterns themselves stand for sequences of cyclosymmetric ring elements, as long as this does not affect the security level.

A disadvantage of too many layers is that, on average, each 1-bit among the $\prod_{i=1}^{L}(\lfloor p_i/2 \rfloor + 1)$ independent bits of each block of H represents about $2^L$ 1-bits in the full $r \times r$ block, rapidly increasing the parity-check matrix density and therefore limiting the error correction capability of bit-flipping and related decoders. However, small values of L (one or two, in some cases possibly even three) yield potentially interesting parameters for cryptographic applications.

A. Security considerations 

An immediate observation on the structure of cyclosymmetric codes is that one can optimize the Stern attack [9], [32], [39] and its variants by taking advantage of the form of each row of the parity-check matrix when performing linear algebra operations. Indeed, Stern tries to retrieve a row of low density from the dual code; since the first row consists of one element followed by a palindrome, and the remaining rows are rotated versions thereof, one can in principle halve the overall effort incurred by row manipulations. However, this apparent improvement may turn out to be ineffective: linear algebra operations quickly destroy the palindrome structure within the rows, thwarting the optimization.

Leon's attack [26] and related ones do not seem to benefit at all either, because they already ignore part of each row involved in linear algebra operations. Interestingly, a brute force attack would be faster than Leon's against cyclosymmetric codes because it would need to test only $\binom{\lfloor p/2 \rfloor}{\lfloor w/2 \rfloor}$ rather than $\binom{p}{w}$ candidates, yet for any practical choice of parameters that number remains far above the cost of Stern's or similar attacks. For example, for parameters where the cost of the best known variants of Stern is about $2^{80}$, with block size $p = 4800$ and private code density $w = 45$, the cost of brute force would be about $\binom{2400}{22} \approx 2^{177}$.

Similar observations apply to information-set decoding attacks [9], [10]: at most, one would expect an improvement by a factor of $2^L$ in the attack cost for L-level CS-MDPC codes (recall that, in practice, $L \le 2$).
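The binomial figures can be reproduced with a few lines of Python. The sketch assumes that a cyclosymmetric row of weight w is determined by choosing $\lfloor w/2 \rfloor$ of roughly $p/2$ mirror pairs (the remaining odd one sitting at a self-mirrored position such as $c_0$); it is a back-of-the-envelope count, not an attack cost model.

```python
from math import comb, log2

p, w = 4800, 45                          # example block size and row weight

generic = log2(comb(p, w))               # weight-w rows of length p with no structure
# Assumed cyclosymmetric count: choose floor(w/2) of the ~p/2 mirror pairs.
cyclo = log2(comb(p // 2, w // 2))

print(f"generic: 2^{generic:.1f}, cyclosymmetric: 2^{cyclo:.1f}, Stern (quoted): 2^80")
```

Both exponents lie far above the $2^{80}$ quoted for the best Stern variants on these parameters, which is the point of the comparison.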

There seems to be no essential restriction on the value of p, although a prudent choice would be to take a prime p to avoid the possibility of attacking smaller subcodes. No structural attacks along the lines of Faugère et al. [18] or Leander and Gauthier [42] seem to apply, since the CS-MDPC trapdoor is of a statistical rather than algebraic nature.

Apart from this, CS-MDPC codes appear to inherit most if not all of the security properties of the superclass of QC-MDPC codes, as indicated in Section II-D. One consequence of all these considerations is that, to the best of our knowledge, the best attacks against CS-MDPC codes are precisely the best attacks against generic QC-MDPC codes.

IV. Scaling the implementation to embedded platforms 
A. General operation 

We use a representation of sparse matrices with plain arithmetic: both matrix-matrix and matrix-vector products coalesce into (vector-vector) cyclic convolution, for which efficient algorithms like Karatsuba [23] and FFT are known. However, simple "textbook" multiplication algorithms, slightly modified so as to operate on circulant matrices represented by their first row alone, are not only more compact, but at least as fast as (and often faster than) more advanced counterparts because of the sparsity of the arguments. Indeed, the operations that actually occur in the Niederreiter cryptosystem always involve at least one sparse operand:

• Inversion of a secret, sparse circulant block $H_{n_0-1}$ of H, yielding the public, dense matrix $K = H_{n_0-1}^{-1}H$: this is achieved with a carefully tuned extended Euclidean algorithm (see Section IV-B).

• Computation of the public syndrome of a sparse error vector. This syndrome is the product of the public, dense parity-check matrix K by the sparse vector e.

• Computation of the private, decodable syndrome $s^T = He^T$ corresponding to a given public syndrome $c^T = Ke^T$. This is the product $H_{n_0-1}c^T$ of the sparse secret matrix $H_{n_0-1}$ by the given dense syndrome $c^T$, since $K = H_{n_0-1}^{-1}H$ and hence $H_{n_0-1}c^T = H_{n_0-1}Ke^T = He^T = s^T$.

• Additionally, our strategy recovers the decodable syndrome s from a modified but nonzero syndrome s' after a failed decoding attempt. Such an attempt yields an incorrect error vector e' of weight not exceeding HDDMARGIN(t) satisfying the relation $s'^T = s^T + He'^T$. Thus, recovering s involves the product $He'^T$ of a sparse matrix by a sparse vector.
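All of these products reduce to cyclic convolution with one sparse operand. Below is a minimal Python sketch of the textbook approach (hypothetical name; binary vectors as lists of 0/1, the sparse operand given by the positions of its ones). It computes $(a \ast b) \bmod (x^r - 1)$ over $\mathbb{F}_2$, which is the first row of $\mathrm{cir}(a)\,\mathrm{cir}(b)$ and also the row-vector product $b\,\mathrm{cir}(a)$. A column-vector product $\mathrm{cir}(a)\,v^T$ is obtained by first reversing the support cyclically ($i \mapsto -i \bmod r$), since $\mathrm{cir}(a)^T = \mathrm{cir}(a')$ with $a'_j = a_{-j \bmod r}$; for cyclosymmetric a that step is unnecessary.

```python
def sparse_times_dense(support: list[int], dense: list[int]) -> list[int]:
    """Cyclic convolution (a * b) mod (x^r - 1) over GF(2); a is given by the positions of its 1s.

    Because cir(a) cir(b) = cir(a * b), the result is also the first row of the product
    of the two circulant matrices.
    """
    r = len(dense)
    out = [0] * r
    for i in support:              # each 1 of the sparse operand ...
        for j in range(r):         # ... adds a copy of the dense operand rotated by i
            out[(i + j) % r] ^= dense[j]
    return out


print(sparse_times_dense([0, 2], [0, 1, 1, 0, 1]))   # [0, 0, 1, 1, 0]
```

The cost is $w \cdot r$ single-bit XORs for a sparse operand of weight w, with no temporary storage beyond the output.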

Interestingly, the McEliece cryptosystem does involve a product of a dense public generator matrix and a dense random vector in semantically secure constructions like Fujisaki-Okamoto [19]. This is a further reason to adopt the Niederreiter cryptosystem on constrained platforms.



B. Space-efficient convolutional inverse 

The extended Euclidean algorithm yields a time-efficient method to compute the inverse of a circulant matrix $H = \mathrm{cir}(h)$. The technique consists of mapping the array h (with components $h_j$, $0 \le j < r$) to a polynomial $h(x) = \sum_{0 \le j < r} h_j x^j \in \mathbb{F}_2[x]$, computing the modular inverse $h(x)^{-1} \bmod (x^r - 1)$, and mapping $h(x)^{-1}$ back to an array denoted $h^{-1}$ such that $H^{-1} = \mathrm{cir}(h^{-1})$.

An apparently less widely known property of the extended Euclidean algorithm is that it admits a space-efficient implementation as well. In its most usual form, when computing $h^{-1} \bmod m$ the algorithm keeps track of four polynomials $f, g, b, c \in \mathbb{F}_2[x]$ (plus two additional polynomials $u, v \in \mathbb{F}_2[x]$ that are usually only implicit) related by the constraints $f = bh + um$ and $g = ch + vm$. This suggests a naive implementation requiring up to 4r bits of storage for those four polynomials. However, polynomials f and c can actually coexist in the same storage area, and similarly for polynomials g and b, as long as r + 2 bits are available for each of these pairs (totaling 2r + 4 bits) because, at any step of the algorithm execution, it holds that $\deg(f) + \deg(c) \le r$ and $\deg(g) + \deg(b) \le r$. One can easily prove this by Floyd-Hoare logic.
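The following Python sketch (our own illustration, not the carefully tuned implementation referred to in the text) computes $h(x)^{-1} \bmod (x^r - 1)$ with the classical quotient-based extended Euclidean algorithm, packing each polynomial of $\mathbb{F}_2[x]$ into a Python integer (bit j holds the coefficient of $x^j$). It keeps f, g, b, c as named above, with u and v implicit, and asserts at every step the degree bounds $\deg(f) + \deg(c) \le r$ and $\deg(g) + \deg(b) \le r$ that justify sharing storage between f and c and between g and b; the sharing itself is not implemented here.

```python
def deg(a: int) -> int:
    """Degree of a GF(2)[x] polynomial packed in an int; -1 for the zero polynomial."""
    return a.bit_length() - 1


def pmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        b >>= 1
    return res


def pdivmod(a: int, b: int) -> tuple[int, int]:
    """Quotient and remainder of a / b in GF(2)[x] (b != 0)."""
    q, db = 0, deg(b)
    while deg(a) >= db:
        shift = deg(a) - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def inverse_mod_xr_minus_1(h: int, r: int) -> int:
    """h(x)^{-1} mod (x^r - 1), for deg(h) < r, by the extended Euclidean algorithm."""
    m = (1 << r) | 1            # x^r - 1 = x^r + 1 over GF(2)
    f, b = m, 0                 # f = b*h + u*m with u = 1
    g, c = h, 1                 # g = c*h + v*m with v = 0
    while g:
        # Pairing that allows (f, c) and (g, b) to share r + 2 bits each.
        assert deg(f) + deg(c) <= r and deg(g) + deg(b) <= r
        q, rem = pdivmod(f, g)
        f, g = g, rem
        b, c = c, b ^ pmul(q, c)
    if f != 1:
        raise ValueError("h(x) is not invertible modulo x^r - 1")
    return b


r = 7
h = 0b0000111                    # h(x) = 1 + x + x^2
h_inv = inverse_mod_xr_minus_1(h, r)
print(pdivmod(pmul(h, h_inv), (1 << r) | 1)[1])   # 1
```

Over $\mathbb{F}_2$, $x + 1$ divides $x^r - 1$, so only odd-weight h can be invertible; the sketch raises `ValueError` otherwise.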

C. A space-efficient decoder 

The technique of bit-flipping decoding has received a substantial amount of attention in the literature since Gallager's discovery of LDPC codes […], [33], […], [45], [46], […]. However, these works are mostly concerned with improving the error correction capability rather than reducing computational resource requirements. Even techniques designed for VLSI like the soft bit-flipping (SBF) technique [14], […], which might be potential candidates for implementation on the small processors typical of an Internet of Things scenario, turn out to take far more ancillary storage (namely, still $O(n)$ for a code of length n) than is typically available on those processors.

It turns out that one can entirely avoid the large storage requirements of a bit-flipping decoder. For cryptographic applications, where the number of introduced errors is fixed and known beforehand, the error correction capability is not the central concern, as long as the desired security level can be attained while fitting the available resources. The variant we propose targets precisely this need. We now describe that variant, together with a rationale for each decision. The full method is summarized as Algorithm 1.

• On-the-fly counter update: The usual bit-flipping strategy requires two passes over the word variables at each step of the decoding process, namely, a first pass to determine the number of parity errors each variable is involved in (thus keeping an array of counters, one for each variable), and a second pass to tentatively correct the most suspicious variables, which are taken to be those whose parity error count is above a certain threshold. While the second pass could in principle be avoided by adopting a carefully designed data structure singling out the positions that actually exceed the threshold, not only would maintaining such a structure be considerably expensive, but the approach is also ineffective for the better part of the decoding process, since a large fraction of the variables is expected (and experimentally observed) to be deemed suspicious; as a result, this whole approach is easily outperformed by plain counters in both storage requirements and processing time.
We avoid the second pass, the complicated data structure, and even the need to keep an array of counters by counting on the fly the number of parity errors each variable is involved in, then deciding immediately whether it has to be tentatively corrected, and modifying the syndrome accordingly.

A consequence of this is that the relation between the actual parity-error counts evaluated on the fly and the bit-flipping threshold value is likely to change at each such correction, and the decisions that will be taken for variables not yet reached may differ from what they would be if the counters were computed separately. In fact, the parity error threshold becomes known only approximately (unless one took the effort to update it by checking all variables again each time one of them is corrected), but this turns out not to be detrimental to a successful correction of all errors; on the contrary, it is empirically observed to enhance the chance of a successful decoding for practical parameters. This can be explained by considering that the number of false positives and false negatives in the error detection heuristic for bits not yet processed is reduced whenever one real error is corrected: in other words, there is a better signal-to-noise ratio in the bit reliability estimation that would be missed if all counters were computed before any actual correction is attempted.

• Onset threshold estimation: As we pointed out, bit 
flipping works not only with the exact value of the 
parity-error threshold, but also with a reasonable estimate 
thereof. This holds equally well at the onset of the 
process, so that not even the initial parity-error threshold 
θ_0 needs to be exact. 

Analytically deriving a reasonable initial value, however, 
proves to be rather difficult, but it is easy to bypass 
this problem by adopting an experimental estimate. This 
is done by generating a number (say, of order 10^3) of 
codes uniformly at random, then performing for each one 
a number (say, of order 10^3 as well) of decodings of 
uniformly generated error patterns of suitable weight, and 
finally recording, for each decoding, the initial maximum 
over all variables of the number of unsatisfied parity 
checks each variable is involved in. The empirical estimate 
of the initial parity-error threshold θ_0 is then taken to be 
the average of those maximum counts. The standard deviation 
is observed to be fairly small, so this approximation, 
which lies around 0.7 d_v to 0.8 d_v depending on the 
values of r, t, and d_v itself, leads to a surprisingly stable 
decoding behavior. 
• Threshold fine tuning: The actual parity-error threshold 
for bit flipping does not need to be the very maximum 
current parity-error count among all variables. A faster 
variant is achieved by setting the threshold somewhat 
below that maximum, say δ parity errors below it. 
Experimentally, a fine-tuned δ can improve decoding speed 
by an order of magnitude, so this variant is worthwhile. 

However, not only the decoding speed, but also the 
likelihood of decoding failure increases with growing δ, 
imposing a cutoff at a certain optimal point. As in the case 
of the initial threshold estimate, deriving an analytical 
value for the optimum δ is a difficult and elusive task. 
We therefore adopt an empirical estimate obtained from 
simulations here as well. 

• Decoding failure handling: Because a large δ makes 
a decoding failure more likely, the decoder must be 
prepared to decrease δ and restart the process. 
Fortunately, rewinding the process to recover the original 
syndrome is easy to do in place, as the difference between 
the original syndrome and the current one is the syndrome 
of the partial error pattern constructed by the decoder up 
to the failure detection. 

Decoding failure is usually detected when a maximum 
number of decoding iterations is exceeded. Early detection 
is possible, however, by following the evolution of the 
weight of the error pattern being reconstructed. Although 
that weight can temporarily surpass the final weight of 
t errors, in a successful decoding the provisional weight 
is very unlikely to be too large. A simple and sensible 
upper limit obtained from simulations is 3t/2 errors (i.e., 
the decoding process is allowed to accumulate spurious 
errors up to 50% above the t limit before declaring failure 
and decreasing δ), since no successful decoding has been 
observed to reach this margin before the process begins 
to reduce it and converge to zero errors. 

• Simple supporting algorithms: Sophisticated algorithms 
with good asymptotic behavior turn out to be an 
unnecessary hindrance in the context of decoding at 
practical, cryptographically oriented parameters. 

Thus, for instance, even though convolution-style 
algorithms may seem ideal to handle products of circulant 
matrices, in practice one of the factors is usually so sparse 
that the much simpler approach of just adding together 
the few rows or columns of the dense factor indicated 
by the nonzero entries of the sparse factor yields a faster 
outcome (and smaller executable code). 

Likewise, representing the error pattern e being 
reconstructed as an unsorted list of error coordinates yields 
the most compact representation of e and is very efficient 
for cryptographic applications because of the relatively 
small target weight of e, even though this incurs 
sequential searches and updates. 
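The empirical estimation of the initial threshold θ_0 described under onset threshold estimation can be sketched in C as follows. This is a minimal illustration under our own assumptions, not the code used for the experiments: the generator xs32, the helper distinct and the function estimate_theta0 are hypothetical names, each of the n_0 blocks of H is drawn as a random list of d_v distinct row offsets, and one byte is used per syndrome bit for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* xorshift32: tiny deterministic PRNG (state must be nonzero). */
static uint32_t xs32(uint32_t *st) {
    uint32_t x = *st;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return *st = x;
}

/* Draw k distinct values in [0, m) into out by rejection sampling. */
static void distinct(uint32_t *st, size_t m, size_t k, size_t *out) {
    for (size_t i = 0; i < k; ) {
        size_t v = xs32(st) % m, dup = 0;
        for (size_t q = 0; q < i; q++) dup |= (out[q] == v);
        if (!dup) out[i++] = v;
    }
}

/* Hypothetical estimator: average, over `codes` random quasi-cyclic codes
 * and `trials` random weight-t errors per code, of the largest number of
 * unsatisfied checks seen by any of the n0*r variables before any flip. */
static double estimate_theta0(size_t r, size_t n0, size_t dv, size_t t,
                              size_t codes, size_t trials, uint32_t seed) {
    size_t *H = malloc(n0 * dv * sizeof *H), *err = malloc(t * sizeof *err);
    uint8_t *s = malloc(r);
    double total = 0.0;
    assert(H && err && s && dv <= r && t <= n0 * r && seed != 0);
    for (size_t c = 0; c < codes; c++) {
        for (size_t b = 0; b < n0; b++) distinct(&seed, r, dv, H + b * dv);
        for (size_t k = 0; k < trials; k++) {
            memset(s, 0, r);
            distinct(&seed, n0 * r, t, err);
            for (size_t q = 0; q < t; q++)          /* s = H e^T */
                for (size_t z = 0; z < dv; z++)
                    s[(err[q] % r + H[(err[q] / r) * dv + z]) % r] ^= 1;
            unsigned maxcount = 0;
            for (size_t j = 0; j < n0 * r; j++) {   /* initial max count */
                unsigned u = 0;
                for (size_t z = 0; z < dv; z++)
                    u += s[(j % r + H[(j / r) * dv + z]) % r];
                if (u > maxcount) maxcount = u;
            }
            total += maxcount;
        }
    }
    free(H); free(err); free(s);
    return total / (double)(codes * trials);
}
```

A call such as estimate_theta0(4801, 2, 45, 84, 1000, 1000, seed) would mirror the 10^3 × 10^3 experiment described above; a run of that size is meant to be performed offline on a workstation, not on the constrained device.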
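The shortcut for products of circulant matrices can be illustrated with polynomials modulo x^r − 1 over GF(2), since a circulant matrix is determined by its first row. The sketch below, with the hypothetical function sparse_dense_mul, XORs one cyclic shift of the dense factor per nonzero term of the sparse factor, so a sparse factor of weight w costs w·r bit operations and no general convolution is needed; bits are stored one per byte only for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* c(x) = a(x) * b(x) mod (x^r - 1) over GF(2).
 * a is sparse: its w exponents are listed in a_exps.
 * b is dense: r coefficients, one byte (0/1) per coefficient.
 * Each nonzero term x^k of a contributes b shifted cyclically by k. */
static void sparse_dense_mul(size_t r, const uint16_t *a_exps, size_t w,
                             const uint8_t *b, uint8_t *c) {
    assert(r > 0 && b != c);
    memset(c, 0, r);
    for (size_t k = 0; k < w; k++) {
        size_t shift = a_exps[k] % r;
        for (size_t i = 0; i < r; i++) {
            size_t dst = i + shift;
            if (dst >= r) dst -= r;   /* i + shift < 2r: one subtraction */
            c[dst] ^= b[i];
        }
    }
}
```

For example, with r = 5, a(x) = 1 + x^2 and b(x) = 1 + x, the result is 1 + x + x^2 + x^3.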
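The unsorted list representation of e, with its sequential searches and updates, can be sketched as follows. The function toggle_coord is a hypothetical helper that mirrors lines 18 to 23 of Algorithm 1; its capacity argument plays the role of HDDMARGIN(t), taken here as 3t/2 as suggested under decoding failure handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* e[0..*ew-1] is an unsorted list of error coordinates with room for
 * `margin` entries. Toggling j removes it if present (the last entry is
 * moved into the freed slot) or appends it otherwise.
 * Returns 0 on success, -1 if the list is full (too many spurious errors). */
static int toggle_coord(uint16_t *e, size_t *ew, size_t margin, uint16_t j) {
    assert(*ew <= margin);
    for (size_t q = 0; q < *ew; q++) {   /* sequential search */
        if (e[q] == j) {
            e[q] = e[--*ew];             /* O(1) removal; order not kept */
            return 0;
        }
    }
    if (*ew >= margin)
        return -1;
    e[(*ew)++] = j;
    return 0;
}
```

With t = 84, for instance, the list needs room for 126 coordinates, i.e. 252 bytes if each coordinate takes 16 bits.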

Taking all this into account, we describe in Algorithm 1 an 
efficient variant of the hard-decision decoding method tailored 
for platforms with highly constrained data and code storage 
and processing power. 
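For concreteness, the following C function is one possible rendering of Algorithm 1. It is a sketch under stated assumptions rather than the implementation measured in Section VI: the names hdd_decode, row_of, unsat_count, flip_checks and weight are hypothetical, the syndrome is stored one bit per byte instead of bit-packed, coordinates are 16-bit, and the offsets in each block of H are assumed to be smaller than r, so that a single conditional subtraction replaces the modulo. The flag retry is reset at the start of every attempt so that a successful retry ends the outer loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Row index (col + off) mod r, assuming col < r and off < r. */
static size_t row_of(size_t r, size_t col, uint16_t off) {
    size_t i = col + off;
    return i >= r ? i - r : i;
}

/* Unsatisfied checks involving the variable at column col of a block. */
static unsigned unsat_count(size_t r, size_t dv, const uint16_t *L,
                            size_t col, const uint8_t *s) {
    unsigned u = 0;
    for (size_t z = 0; z < dv; z++) u += s[row_of(r, col, L[z])];
    return u;
}

/* XOR one column of H into the syndrome (flip its dv parity checks). */
static void flip_checks(size_t r, size_t dv, const uint16_t *L,
                        size_t col, uint8_t *s) {
    for (size_t z = 0; z < dv; z++) s[row_of(r, col, L[z])] ^= 1;
}

static size_t weight(size_t r, const uint8_t *s) {
    size_t w = 0;
    for (size_t i = 0; i < r; i++) w += s[i];
    return w;
}

/* H: n0 consecutive lists of dv offsets; s: syndrome (modified in place);
 * e: room for at least 3t/2 coordinates.
 * Returns the error weight ew on success, -1 on failure. */
static long hdd_decode(size_t r, size_t n0, size_t dv, const uint16_t *H,
                       uint8_t *s, size_t t, unsigned delta, unsigned theta0,
                       unsigned iter_bound, uint16_t *e) {
    const size_t margin = 3 * t / 2;              /* HDDMARGIN(t) */
    size_t ew;
    int retry;
    assert(iter_bound > 0 && n0 * r <= 65536u);
    do {
        retry = 0;                                /* reset on every attempt */
        ew = 0;
        unsigned theta = theta0, iter = 0;
        do {
            unsigned newmax = 0;
            for (size_t j = 0; j < n0 * r; j++) {
                const uint16_t *L = H + (j / r) * dv;
                unsigned unsat = unsat_count(r, dv, L, j % r, s);
                if (unsat > newmax) newmax = unsat;
                if (unsat + delta < theta) continue;  /* below theta - delta */
                size_t q = 0;
                while (q < ew && e[q] != j) q++;      /* sequential search */
                if (q < ew) e[q] = e[--ew];           /* undo earlier flip */
                else if (ew < margin) e[ew++] = (uint16_t)j;
                else break;                           /* too many spurious errors */
                flip_checks(r, dv, L, j % r, s);      /* update syndrome at once */
            }
            theta = newmax;
            iter++;
        } while (weight(r, s) != 0 && iter < iter_bound);
        if ((weight(r, s) != 0 || ew > t) && delta > 0) {
            delta--;                                  /* margin was too high */
            for (size_t q = 0; q < ew; q++)           /* revert syndrome */
                flip_checks(r, dv, H + (e[q] / r) * dv, e[q] % r, s);
            retry = 1;
        }
    } while (retry);
    return (weight(r, s) == 0 && ew <= t) ? (long)ew : -1;
}
```

On success the function returns the number of recovered error positions, stored unsorted in e; it returns -1 only when decoding still fails after δ has been lowered to zero.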

V. Suggested parameters 

For the sake of illustration, sample CS-MDPC parameters 
for typical security levels are listed in Tables I and II. 

Although key sizes still do not match the typical values 
for pre-quantum elliptic curve cryptosystems, CS-MDPC 
Algorithm 1 Efficient hard-decision decoder for constrained 
platforms 

Input: H ∈ F_2^{r×n} (with n = n_0 r), a systematic quasi-cyclic 
low-density parity-check matrix with constant column weight 
d_v, represented as an array of n_0 lists of the d_v coordinates 
of the nonzero components in each cyclic block of H. 
Input: s ∈ F_2^r, a bit array representing the received syndrome. 
Input: δ, a threshold margin. 
Input: θ_0, an estimate of the largest number of unsatisfied 
parity checks among the n variables (bits) of the codeword 
with errors. 
Input: iterBound, a limit on the number of iterations for 
successful decoding (the heuristic default is iterBound = t). 
Output: e ∈ F_2^n, a sparse vector of weight wt(e) ≤ t 
represented as a list of coordinates of its nonzero components 
(but able to hold up to HDDMARGIN(t) > t such coordinates), 
or ⊥ upon failure. 
Remark: compute mod remainders via iterated subtraction. 

 1: repeat 
 2:   retry ← false 
 3:   ew ← 0                        ▷ initialize e to no errors 
 4:   iter ← 0 
 5:   θ ← θ_0                       ▷ initial estimate 
 6:   repeat 
        ▷ Change the bits of the codeword with errors that are 
        involved in the largest number of unsatisfied parity checks: 
 7:     newmax ← 0                  ▷ new estimate for θ 
 8:     for j ← 0 to n − 1 do 
 9:       L ← H[⌊j/r⌋] 
10:       unsat ← 0 
11:       for z ← 0 to d_v − 1 do 
12:         if s[(j + L[z]) mod r] = 1 then 
13:           unsat ← unsat + 1 
14:         end if 
15:       end for 
16:       newmax ← max{unsat, newmax} 
17:       if unsat ≥ θ − δ then     ▷ try to correct: 
18:         if ∃q ∈ [0..ew − 1] such that e[q] = j then 
19:           ew ← ew − 1, e[q] ← e[ew] 
20:         else if ew < HDDMARGIN(t) then 
21:           e[ew] ← j, ew ← ew + 1 
22:         else                    ▷ too many spurious errors introduced 
23:           break                 ▷ to line 31 
24:         end if 
25:         for z ← 0 to d_v − 1 do ▷ update syndrome: 
26:           i ← (j + L[z]) mod r 
27:           s[i] ← ¬s[i] 
28:         end for 
29:       end if 
30:     end for 
31:     θ ← newmax 
        ▷ Iterate until the syndrome is zero (or until a bound on the 
        number of iterations is reached) 
32:     iter ← iter + 1 
33:   until wt(s) = 0 or iter = iterBound 



Algorithm 1 (Continued) 



TABLE I 
CS-MDPC PARAMETERS FOR NIEDERREITER ENCRYPTION (1 LAYER; n_0 = 2) 

r        |pk| (bits)   d_v    t     θ_0    δ     sec 
4801     2401          45     84    37     9     2^80 
7839     3919          65     117   48     4     2^112 
9863     4931          71     134   55     5     2^128 
20487    10243         105    198   75     8     2^192 
32771    16386         137    264   105    10    2^256 



Niederreiter encryption keys become competitive with 
pre-quantum RSA and post-quantum, size-optimal NTRU for 
non-legacy security levels, namely, from 2^112 onward. We also 
compare the key sizes with the previous smallest code-based 
parameters, namely, those attainable with QC-MDPC codes, 
as shown in Table III. Moreover, as we will see in 
Section VI, the result is still competitive with elliptic curve 
implementations on constrained platforms according to other 
relevant metrics. 

VI. Experimental results 

We assessed the effectiveness of the techniques described 
herein in terms of ROM and RAM usage by implementing the 
Niederreiter cryptosystem with the proposed parameters and 
decoder in the C programming language on the 
PIC24FJ32GA002-I/SP (32 MHz) platform. No assembly 
language optimization was attempted. 

Mapping from raw plaintext (bit sequences) to error 
patterns is most efficiently achieved (in processing speed, 
data storage and executable code size requirements) with the 
Sendrier technique 11371 . It was natural to adopt the same 
technique for choosing CS-MDPC codes uniformly at random. 

The observed program size (i.e. the ROM requirements for 
the deployed system) with the compiler employed is about 
5.8 KiB. Storage (RAM) requirements are about 2.2 KiB 
overall, including the space needed for indices, counters and 
runtime bookkeeping (return addresses, stack management). 



34:   if (wt(s) ≠ 0 or ew > t) and δ > 0 then 
35:     δ ← δ − 1                   ▷ threshold margin was too high 
36:     for q ← 0 to ew − 1 do      ▷ revert syndrome to original form: 
37:       j ← e[q] 
38:       L ← H[⌊j/r⌋] 
39:       for z ← 0 to d_v − 1 do 
40:         i ← (j + L[z]) mod r 
41:         s[i] ← ¬s[i] 
42:       end for 
43:     end for 
44:     retry ← true 
45:   end if 
46: until not retry 
47: if wt(s) = 0 and ew ≤ t then 
48:   return e, ew 
49: else 
50:   return ⊥ 
51: end if 



TABLE II 
CS-MDPC PARAMETERS FOR NIEDERREITER ENCRYPTION (2 LAYERS; n_0 = 2) 

r                    |pk| (bits)        d_v    t     θ_0    δ     sec 
61 × 79 = 4819       31 × 40 = 1240     45     84    37     9     2^80 
47 × 167 = 7849      24 × 84 = 2016     65     117   48     4     2^112 
71 × 139 = 9869      36 × 70 = 2520     71     134   55     5     2^128 
103 × 199 = 20497    52 × 100 = 5200    105    198   75     8     2^192 
73 × 449 = 32777     37 × 225 = 8325    137    264   105    10    2^256 



TABLE III 
Public key and cryptogram size comparison (sizes in bits) 

CS-MDPC    RSA      NTRU     QC-MDPC    sec 
2016       2048     4411     7836       2^112 
2520       3072     4939     9856       2^128 
5200       7680     7447     20480      2^192 
8325       15360    11957    32768      2^256 



By contrast, a plain implementation of the bit flipping tech- 
nique would take at least 7.2 KiB for the counters alone, far 
above the 3.8 KiB RAM available on a PIC24FJ32GA002 
microcontroller. For simplicity, we limited the experiments to 
1-layer CS-MDPC codes at the 80-bit security level. 
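A rough calculation helps put these storage figures in perspective. The sketch below is a back-of-the-envelope estimate under our own assumptions, not a measurement: counters are bit-packed with just enough bits to hold the values 0 to d_v, coordinates take 16 bits, the syndrome is bit-packed, and the error list has room for HDDMARGIN(t) = 3t/2 entries; the function names are hypothetical. For the first row of Table I (r = 4801, n_0 = 2, d_v = 45, t = 84), which we take to be the 1-layer 80-bit parameter set used here, n = 9602 counters of 6 bits need 7202 bytes, the same order as the figure quoted above, whereas syndrome, error list and the offsets describing H together need 1033 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store a given number of bits. */
static size_t bytes_for_bits(size_t bits) { return (bits + 7) / 8; }

/* Array of n bit-packed counters, each wide enough to hold 0..dv. */
static size_t counter_bytes(size_t n, size_t dv) {
    size_t width = 0;
    assert(dv > 0);
    while (((size_t)1 << width) <= dv) width++;
    return bytes_for_bits(n * width);
}

/* Main buffers of the on-the-fly decoder, assuming 16-bit coordinates:
 * bit-packed syndrome, error list of 3t/2 entries, n0 lists of dv offsets. */
static size_t decoder_bytes(size_t r, size_t n0, size_t dv, size_t t) {
    size_t syndrome = bytes_for_bits(r);
    size_t error_list = (3 * t / 2) * 2;
    size_t h_lists = n0 * dv * 2;
    return syndrome + error_list + h_lists;
}
```

Under these assumptions, the counter array alone would be about seven times larger than the decoder's own buffers.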

In comparison, elliptic curve ElGamal encryption at the 
same security level on the ATmega128L platform using the 
state-of-the-art RELIC library [1] demands over 31 KiB ROM 
and 2.1 KiB RAM. 

VII. Conclusion 

We described how to scale code-based cryptosystems to 
platforms with very constrained storage and processing re- 
sources. Central to our proposal is the adoption of quasi- 
cyclic LDPC codes coupled with a storage-efficient algorithm 
for key pair generation, a carefully tailored variant of hard- 
decision decoding, and fine-tuned parameters. The efficiency 
of the result is competitive with traditional cryptosystems like 
those based on elliptic curves. 